Variation in clinical target volume delineation in postoperative radiotherapy for biliary tract cancer

We aimed to evaluate the inter-clinician variability in the clinical target volume (CTV) for postoperative radiotherapy (PORT) for biliary tract cancer (BTC) including extrahepatic bile duct cancer (EBDC) and gallbladder cancer (GBC). Nine experienced radiation oncologists delineated PORT CTVs for distal EBDC (pT2N1), proximal EBDC (pT2bN1) and GBC (pT2bN1) patients. The expectation maximization algorithm for Simultaneous Truth and Performance Level Estimation (STAPLE) was used to quantify expert agreements. We generated volumes with a confidence level of 80% to compare the maximum distance to each CTV in six directions. The degree of agreement was moderate; overall kappa values were 0.573 for distal EBDC, 0.513 for proximal EBDC, and 0.511 for GBC. In the distal EBDC, a larger variation was noted in the right, post, and inferior direction. In the proximal EBDC, all borders except the right and left direction showed a larger variation. In the GBC, a larger variation was found in the anterior, posterior, and inferior direction. The posterior and inferior borders were the common area having discrepancies, associated with the insufficient coverage of the paraaortic node. A consensus guideline is needed to reduce inter-clinician variability in the CTVs and adequate coverage of regional lymph node area.


Introduction
Biliary tract cancer (BTC) arises in the bile duct system being surrounded by numerous critical organs and major vessels. The anatomical location and spreading pattern make a complete a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 resection a complex procedure. Jarnagin et al. [1] reported that only a half of patients who were considered to be resectable received a potentially curative resection. For advanced BTC patients who are not suitable for curative surgery, systemic treatments consisting of cisplatin plus gemcitabine are preferred [2].
Significant differences in survival are found according to the resection status. The 5-year survival rates are up to 50% with a complete resection, but decrease to as low as 0% with an incomplete resection or without resection [3][4][5][6][7]. To improve prognosis of advanced BTC, immunotherapeutic agents can be added to the first-line treatment [8]. Also, optimal secondline systemic treatments after progression of advanced BTC are under exploring, such as adding oxaliplatin or liposomal irinotecan to fluorouracil and folinic acid [9,10].
Regarding the patterns of failure, locoregional failure (LRF) has been reported to occur as the first recurrence. In extrahepatic bile duct cancer (EBDC) patients undergoing curative resection without adjuvant RT, about 40% of patients experienced LRF [11][12][13]. Similarly, gallbladder cancer (GBC) patients have been reported that LRF occurred in 30-40% of patients as an initial failure, although the incidence of distant failure was relatively higher than EBDC patients [14,15].
Considering the patterns of failure, it is a rational strategy to add radiotherapy (RT) to reduce LRF. Owing to the rareness of BTC, accounting for 3% of malignancies in the gastrointestinal (GI) system [16], there are no randomized controlled trials confirming the benefit of postoperative RT (PORT). Although the evidence of PORT is not clear, BTC patients with unfavorable risk factors were recommended to undergo PORT concomitantly with fluoropyrimidine-based chemotherapy [17].
However, clinical target volume (CTV) guidelines for PORT have not yet been established in BTC. There are a few reports proposing contouring guidelines for upper abdominal normal organs [18] or CTVs of definitive RT in patients with locally advanced BTC [19]. Physicians inevitably extrapolate from the CTV guideline for PORT of pancreatic head cancer [20]. Consequently, there can be variations in the CTVs among radiation oncologists. In the current study, we aimed to evaluate the inter-physician variability in the CTV contouring for resected BTC.

Materials and methods
Medical records of patients who underwent curative resection followed by PORT between 2016 and 2021 for distal EBDC (dEBDC), proximal EBDC (pEBDC), and GBC were reviewed. We found thirteen dEBDC patients undergoing pylorus-preserving pancreaticoduodenectomy for dEBDC, twelve pEBDC patients undergoing hepatobiliary resection, and seven GBC patients undergoing extended cholecystectomy. Then we selected those patients pathologically diagnosed with LN-positive and resection margin-negative diseases. Patients with involved resection margin were excluded to minimize viability resulted from the consideration about microscopic tumor extension. The number of patients who met these criteria was six for dEBDC, three for pEBDC, and five for GBC. Finally three BTC patients representing three tumor sites were selected, who had well-localized primary tumor and better image quality in computed tomography (CT) scans. Table 1 details three BTC patients who underwent curative resection followed by PORT. Patient 1 underwent pylorus-preserving pancreaticoduodenectomy for dEBDC. The primary tumor invaded the bile duct wall with a depth of 5-12mm (pT2), and lymph node (LN) metastases were confirmed in two out of 11 LNs (pN1). Patients 2 received extended hemihepatectomy for pEBDC. The primary tumor invaded adjacent hepatic parenchyma (pT2b), and one out of seven LNs was found to be involved (pN1). Patient 3 had laparoscopic extended cholecystectomy for GBC. The primary tumor invaded perimuscular connective tissue on the hepatic side without extension into the liver (pT2b), and metastases were confirmed in one out of two LNs (pN1). Pathologic TNM staging was based on the 8th edition of the American Joint Committee on Cancer [21].
The institutional review board approved this study (approval number: EUMC 2021-03-016), and waived for an informed consent. Medical history, preoperative abdomino-pelvic CTs, and pathologic reports were provided to nine radiation oncologists from nine institutions. All the radiation oncologists trained in the same institution, and have careers in radiation oncology for four to fourteen years. The radiation oncologists were asked to delineate CTVs on the free-breathing CT scans for RT planning, including primary tumor bed and regional LNs. The nine clinicians did not share protocols for BTC contouring or discuss about CTV delineation during the study period. The information of clinicians who contoured CTVs were masked, then CTVs were collected and analyzed.
We implemented data analysis to statistically verify the consensus among the CTVs delineated by the nine clinicians for each case. First, we calculated volumes of each CTV for each patient and minimum, maximum, mean, standard deviation, intersection, and union volume of CTVs per case using the "computational environment radiotherapy research (CERR)" [22]. The recent version of CERR contains a consensus tool for more probabilistic and quantitative analysis such as apparent agreement, kappa corrected agreement, and "simultaneous truth and performance level estimation (STAPLE)" based probabilities as well as sensitivity, specificity, and overall kappa value.
If the 'true contour' is assumed to be within the CTVs delineated by the nine clinicians, it might be between the intersection and union volume. Then, for each voxel in a union volume, it is possible to estimate the proportion of how many clinicians included the voxel. We can define the agreed volume based on a confidence level (CL); CL = 0% becomes a union, and CL = 100% becomes an intersection volume, theoretically. This factor is determined as an "apparent agreement" in the CERR. However, coincidence can be involved in the evaluation of the apparent agreement. Therefore, generalized kappa needs to be applied to evaluate the consistency regarding voxels being included in the target by chance. This concept is expressed in the following way: Kappa ¼ ðApparent agreement À Chance agreementÞ ð1 À Chance agreementÞ , where Chance_agreement is the expected agreement by chance alone and is based on marginal totals [23]. When the apparent agreement is corrected using the kappa value, it generally has less agreement than the apparent agreement. The overall kappa value is often used as a single metric evaluation factor of concordance in target volume consensus studies. According to the value, the overall kappa is classified as follows: 0 indicates no agreement, 0-0.2 indicates slight agreement, 0.21-0.4 indicates fair, 0.41-0.6 indicates moderate, 0.61-0.8 indicates substantial, and more than 0.81 indicates excellent agreement [23].
STAPLE is known as the expectation maximization algorithm, which can decide the 'true contours' by optimizing sensitivity and specificity parameters of all contours per case [24]. The user can define the desired 'true contours' by adjusting CL. In this study, we generated CTV to have a CL of 80% (CTV80), which was used as the benchmark.
In addition, we computed borders of nine CTV and CTV80 per case in six directions (leftright (LR), anterior-posterior (AP), and superior-inferior (SI)). Then we calculated the discrepancy between the CTV80 and each CTV in six directions. We used in-house software for the analysis written in MATLAB (MathWorks, Natick, MA).
Another standard method to verify consistency is a conformity index (CI). We adopted generalized CI (CI gen ) proposed by Kouwenhoven et al. to evaluate CI regardless of the number of observers without bias [25]. CI gen is conceptually expressed as: where A i and A j stand for i'th and j'th clinician's contour. Generally, if it is less than 0.5, the correlation is estimated as low, and if it is 0.7 or more, it is considered suitable.

Results
A total of 27 CTVs for dEBDC, pEBDC, and GBC delineated by nine clinicians were analyzed ( Table 2). The mean and standard deviation of CTVs were 120.62 ± 40.98 cm 3 for dEBDC, 152.05 ± 54.84 cm 3 for pEBDC, and 131.94 ± 46.93 cm 3 for GBC. The degree of agreement was moderate in all cases; overall kappa values were 0.573 for dEBDC, 0.513 for pEBDC, and 0.511 for GBC. The CI gen values were less than 0.5, and all cases showed weak correlation. CTV80 was generated and overlapped with the delineated CTVs in Fig 1. The differences in six directions borders between CTVs and CTV80 were plotted in Fig 2. In the dEBDC case, a relatively larger variation was noted in the right, post, and inferior directions. In the pEBDC case, all borders except the right and left directions showed larger variations. In the GBC case, a relatively larger variation was found in the anterior, posterior, and inferior borders. The posterior and inferior borders were the common areas showing larger discrepancies in all BTC cases.

Discussion
The aim of PORT is to irradiate risky areas for LRF sufficiently while minimizing dose to the normal organs. With the introduction of intensity modulated RT (IMRT), more precise dose distribution can be achieved via accurate RT planning. Although the standardization of delineation guidelines is necessary to accompany the technical development, CTV guidelines for PORT of BTC have not yet been established. At this time, to the best of our knowledge, there are no studies reporting the variability among radiation oncologists in the CTV delineation for BTC patients undergoing curative resection. This is the first study presenting the large variation in the CTV delineation among experienced radiation oncologists.
CTV should encompass regional LNs as well as primary tumor beds. According to the surgical series analyzing LN involvement [26][27][28][29], the LN stations with a high risk of involvement for BTC are the hepatoduodenal ligament (HDL), celiac axis (CA), superior mesenteric artery (SMA), anterior pancreaticoduodenal nodes (aPDN), posterior pancreaticoduodenal nodes (pPDN), and paraaortic nodes (PAN). Notwithstanding the objective data from the pathological-surgical studies, several geographic misses could be found in the CTVs. Socha et al. [30] reviewed PORT studies for BTC, and classified which LNs were included in the CTVs. Compared with pathological-surgical data, the PAN was frequently missed in the CTVs for BTC. Other LNs at risk were also hardly included; SMA in dEBDC and GBC, aPDN in dEBDC, and pPDN in GBC.
In the present study, we could find similar geographic misses in the CTVs delineated by experienced clinicians (Table 3). First, the anterior border showed a relatively wider variation in pEBDC and GBC. The pPDN and SMA would be corresponding to the aforementioned area. More importantly, the posterior and inferior borders were confirmed to have prominent discrepancies in all cases. The LN station associated with this area is the PAN. Anatomically, the PAN encompasses LNs in the left latero-aortic and inter-aortico-venous regions as well as the anterior and posterior regions of the aorta [31]. For the delineation of the PAN, the Radiation Therapy Oncology Group recommends to use an asymmetric expansion from the contour of the aorta. The upper and lower limits are from the most cephalad to the CA to the bottom of the L2 vertebral body, although defined for pancreatic cancer [20]. In our study, however, participant experienced radiation oncologists frequently omitted the LNs in the posterior to the aorta or the inter-aortico-venous regions. The inferior margin of the PAN showed inconsistency, either (Fig 1).
However, PAN involvement is classified as distant metastasis (M1) in the TNM staging system for dEBDC and GBC [21,32], and treatment outcomes are poorer in patients with PAN metastasis, comparing with other regional LN involvement. Furthermore, the surgical resection was generally accepted as a contraindication in BTC patients with PAN metastasis, as it is the last station of lymphatic pathways drained from the biliary system [33,34]. In the meanwhile, several surgical series have been reported that extended lymphadenectomy could be helpful to improve prognosis in BTC patients with PAN metastasis [26,35,36]. Given the PAN is one of the most risky areas with 7-25% of BTC patients having the PAN involvement [30] further studies are needed to establish the optimal management for these patients. In addition, CTVs based on the risk of involvement of PAN in BTC patients according to the tumor locations and stages needs to be suggested, if PORT is given. Although a consensus guideline for CTV delineation is an important key to reduce variability in CTVs, the marked anatomical variation after curative resection is the frustration to visualize as an atlas. Instead, we can adopt the recommendations for PORT for pancreatic cancer [20] or definitive RT for BTC [19]. These atlases recommend adding margins of 1.0-3.0 cm from the major vessels, such as the aorta, portal vein, CA, and SMA. After the expansion, the primary tumor bed considering individual disease extent will be added in the CTVs. Similarly, studies analyzing the patterns of failure visualize LRF sites by mapping around the reference vessels [11,30,37,38]. Therefore, the CTV delineation based on the expansion from the key vessels would be helpful to decrease inter/intra-clinician variability.
Adopting artificial intelligence (AI) can be a way to the future of PORT, besides referring key vessels to delineate CTVs. In recent years, AI is actively introduced into every steps of RT from deformable image registration to treatment planning and quality assurance [39]. Clinical feasibility of automatic delineation of CTVs has been evaluated in the setting of PORT [40,41]. The assistance of AI is expected to reduce inter-clinician variability and contouring time for PORT [42]. In the future, we may leap the inevitable hurdle to study BTC, rareness, with the development of AI based on the collection and analysis of big data [43]. Limitations of the present study are that we focused on the CTV delineation in this study, while other practical issues were not included, such as the PTV margin, RT technique, or dose prescription. These issues might be clues to investigate the inter-clinician variability. More importantly, CTV was not separately delineated as CTV for the primary tumor bed and CTV for regional LNs. The observed variations reflected discrepancies in the CTV for the primary tumor bed as well as for regional LNs. However, the present study is meaningful as it discusses the wide inter-clinician variability in the CTVs for BTC.

Conclusions
Experienced radiation oncologists showed a moderate agreement in the delineation of CTVs for BTC. Among the six directions, the prominent variation was found in the posterior and inferior borders, associated with the PAN. The consensus guideline is needed to reduce interclinician variability in real-world practices.